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Abstract: The continuous-discrete filtering problem requires the solution of a partial dif- 
ferential equation known as the Fokker-Planck-Kolmogorov forward equation (FPKfe). In 
this paper, it is pointed out that for a state model with an affine, linear drift and state- 
independent diffusion matrix the fundamental solution can be obtained using only linear 
algebra techniques. In particular, no differential equations need to be solved. Furthermore, 
there are no restrictions on the size of the time step size, or on the measurement model. 
Also discussed are important computational aspects that are crucial for potential real-time 
implementation for higher-dimensional problems. The solution is universal in the sense that 
the initial distribution may be arbitrary. 
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1. Introduction 

The problem of continuous-discrete (continuous-continuous) filtering is to estimate the state 
that is described by a continuous-time stochastic process from the observations of a related 
discrete-time (continuous-time) stochastic process called the measurement process. The com- 
plete solution of the filtering problem, in the Bayesian sense, is given by the conditional 
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probability density function. Given the conditional probability density we can define opti- 
mality under any criteria. Usually, the conditional mean, which is the minimum mean-squares 
estimate, is studied. A solution is said to be universal if the initial distribution is arbitrary 

(see, for example, [1]). 

Continuous-discrete filtering requires the solution of a partial differential equation known 
as the Fokker-Planck-Kolmogorov forward equation (FPKfe) [2] . In general, such problems are 
often solved via discretization of the PDE, such as using the method of finite differences, [3,4], 
spectral methods [5] and finite element methods [6] . The numerical solution of a PDE is non- 
trivial. In particular, there are severe time step size and grid spacing restrictions (especially for 
high accuracy solutions) for stability, accuracy, and consistency of the numerical method. In 
addition, a plain implementation places very large computational and memory requirements, 
especially for higher dimensional problems. It is therefore desirable to be able to exactly solve 
for the fundamental solution of the FPKfe, in terms of which the FPKfe can be solved. 

In this paper it is pointed out that the fundamental solution of the FPKfe for the affine, 
linear state model with state-independent diffusion matrix, a special case of the Yau filter 
(discussed below), can be computed using only linear algebra methods. In particular, the 
fundamental solution for such a model can be computed exactly for an arbitrary time step 
size without having to solve a differential equation. It is also noted that the computational 
requirements are considerably less than that suggested by the size of the grid space. In partic- 
ular, the conditional probability density can be computed efficiently (especially, concerning 
memory requirements) and rapidly with the use of sparse multilinear data structures. In 
many applications, such as target tracking, the state model is linear and the measurement 
model is nonlinear. Therefore, the results in the paper are applicable to a large number of 
practical problems. 

The Yau filter is defined to be one where state model drift a linear plus a gradient term 
satisfying the Benes condition and with a time- and state- independent diffusion matrix [7]. 
In continous-continuous filtering theory, the Yau filtering system is defined as one with an 
additional restriction that the measurement process is a continuous-time stochastic process 
with linear drift. The Yau filtering system plays a fundamental role in continuous- continuous 
filtering. The reason is that it has been shown that a finite-dimensional filter of maximal 
rank has to be a Yau filter [7]. In light of its importance, some real-time implementable 
solutions for the Yau filtering system have been investigated. In [8,9], the solution of the Yau 
equation has been shown to be equivalent to a solution of a system of linear ODEs. The case 
of nonlinear observations has also been investigated in some papers. In particular, it is shown 
in [10] that some nonlinear observation models can also be solved via ODEs. Finally, in [11], 
a solution is presented in terms of a power series. 

In this paper, the special case of the continuous-discrete Yau filter, namely one with an 
affine, linear state model drift and state-independent (but possibly time-dependent) diffusion 
matrix, is studied. Note that the measurement model in the continuous-discrete Yau case 
can be arbitrary, and so a more general filtering problem is solved than for the continuous- 
continuous Yau case. In a following paper, it is shown that the fundamental solution of the 
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FPKfe for another special case of the Yau filter, namely one with a gradient drift satisfying a 
certain condition (and also known as the Benes model drift), can also be solved exactly using 
such elementary methods. 

It is noted that the methods developed for solving the continuous-continuous filtering 
problem quoted in the papers above can be used to solve the prediction part of the continuous- 
discrete filtering problem. However, they require solution of ODEs, or a power series expres- 
sion, rather than the closed form expressions derived here. 

The relevant aspects of filtering theory are reviewed in Section ^. The fundamental 
solutions are derived for the additive noise case in Section ^ and ^. Some implementational 
aspects are discussed in Section ^ and related comments on computational aspects of discrete- 
discrete filtering are presented in Section |6|. An example is presented in Section ^. In Appendix 
the formula for the state transition matrix for the general time-dependent case is reviewed. 
In Appendix p|, the linear manoeuvering target tracking models reviewed in [12] are explicitly 
solved. Some of the results are well-known and have been derived elsewhere, sometimes 
presented in a different form. Here use is made of the elegant method of [13] that enables a 
simple and systematic derivation of explicit formulas for arbitrary affine, linear models up to 
four-dimensional state space problems. 



2. Review of Continuous-Discrete Filtering Theory 

2.1 Langevin Equation, the FPKfe and its Fundamental Solution 

The general continuous-time state model is described by the following stochastic differential 
equation (SDE): 



dx(t) = f{x(t),t)dt + e(x(t), i)dv(t), x(to 



(2.1) 



Here x(t) and f(x{t),t) are n— dimensional column vectors, the diffusion vielbein e{x(t),t) 
is an n X p matrix and v(t) is a p— dimensional column vector. The noise process v(t) is 
assumed to be Brownian with covariance Q{t) and the quantity g = eQe^ is termed the 



diffusion matrix. All functions are assumed to be sufficiently smooth. Equation 2.1 is also 
referred to as the Langevin equation. It is interpreted in the Ito sense. 

Let cro{x) be the initial probability distribution of the state process. Then, the evolution 
of the probability distribution of the state process described the Langevin equation, p{t,x), 
is described by the FPKfe, i.e.. 



dp 

m 



it,x) 



i=l 

p{tQ,x) = ao{x). 



dx 



(2.2) 
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Let t" > i', and consider the following PDE: 
dP ^ Q 1 

— {t,x\t\x') = - E ^ [h{x,t)P(t,x\t\x')\ + - E [g^j{t^x)P{t,x\t',x'))\ , 

i=l * i,j=l * 

[ P{t',x"\t',x') = 6''{x-xo). 

(2.3) 

The quantity P{t, x\t' , x') is known as the fundamental solution of the FPKfe. In this instance, 
the physical interpretation is that it is the transition probability density. 

From the fundamental solution one can compute the probability at a later time for an 
arbitrary initial condition as follows^: 

pit", x") = j P{t", x"\t', x')p{t', x') {(Tx'} . (2.4) 

Therefore, in order to solve the FPKfe it is sufficient to solve for the transition probability den- 
sity P{t, x\t', x'). Note that this solution is universal in the sense that the initial distribution 
can be arbitrary. 

2.2 Continuous-Discrete Filtering 

In ths paper, it is assumed that the measurement model is described by the following discrete- 
time stochastic process 

y{tk) = h{x{tk),tk,\M{tk)), k = l,2,..., tk>to, (2.5) 

where y{t) G W^^^,h € M™^-'^, and the noise process vj{t) is assumed to be a white noise 
process. Note that y{to) = 0. It is assumed that piy(tk)\x(tk)) is known. 

Then, the universal continuous-discrete filtering problem can be solved as follows. Let 
the initial distribution be ao{x) and let the measurements be collected at time instants 
ti,t2, ■ ■ ■ ,tk, We use the notation Y{t) = {y{ti) : to < ti < r}. Prior to incorporat- 
ing the measurements, the state evolves according to the FPKfe, i.e., 

p(ti,xi|F(to)) = J Piti,x\to,xo)p{to,xo){d''xo}. (2.6) 

This is the prediction step. 
According to Bayes' rule 

Now p{ti,x\Y{ti)) = p{ti, x\y{ti), Y{tQ)) and since measurement noise is white it follows that 
p{y(ti)\x,Y{to)) = p{y(ti)\x). Hence, the 'corrected' conditional density at time ti is 

^\Y(f \) - p{yiti)\x)p{ti,x\Y{to)) 



^In this paper, all integrals are assumed to be from —oo to +oo, unless otherwise specified. 
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This is then the initial condition of the FPKfe for the next prediction step which results in 
p{t2,x\Y{ti)) = J P{t2,x\ti,xi)p{ti,xi\Y{ti)){(r-xo} , (Prediction Step), 

and so on. 

3. Fundamental Solution I: Additive Noise 

In this section, the following general affine, linear state model with additive noise is considered: 

dx{t) = {F{t)x{t) + l(t))dt + e(t)dvi{t), i = l,...,n. (3.1) 

It is assumed that F{t) commutes at different times, i.e., 

[F{t),F{t')]^F{t)F{t')-F{t')Fit), (3.2) 
= 0. 

The case when F{t) does not commute with itself at different times is discussed in Appendix 
Also, the matrix notation is used here. 

The state model can be written as the following Langevin equation 

fly fl\i 

^(t) = (F(t)x(t) + /(t))+e(t)-(t), (3.3) 

= {F{t)xit) + l{t)) + e{t)uit), 

where v{t) is (5— correlated and Gaussian distributed as follows: 

{i^i{t)) = 0, (3.4) 
{ui{t)i^j{t')) = Qij{t)6{t - t'), Qij{t) = Qji. 

The formal solution of Equation is 

x(t) = U{t, to)x{to) + / U{t, t) [/(r) + e(r)i/(r)]dr, (3.5) 

J to 

where the state transition matrix is the solution of the following equation: 

f(Mo) = F(0^(Mo), ^3_g^ 
U{to,to) = I. 
This can be verified as follows. The Leibniz rule is 

d f'''-^^ f^^'-'^ a$ dh da 
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where $(^, z), a(z), 6(2;) arc assumed to be sufficiently smooth functions of their arguments. 
Prom the Leibniz rule it therefore follows that 



(t) = F{t)U{t,to)x{to) + f F{t)U{t,r) [1{t) + e{T)u{T)]dT + +U{t,t){l{t) + e{t)u{t)), 

J to 

(3.8) 

= F{t) U{t,to)x{to)+ I U{t,T)[l(T) + e{T)u{T)]dT 

Jto 

= {F{t)x{t) + l{t)) + e{t)u{t). 



+ l{t) + e{t)iy{t), 



Also, U {t, to) satisfies the semi-group property: 

U{t2,to) = U{t2,ti)U{h,to), (3.9) 

which immediately implies that 

U{t,to) = U-\to,t). (3.10) 
Since the matrix F{t) commutes at all times 

C/(i,to) = e4^(^)''^ (3.11) 

If the matrix F is time-independent 

[/(t,to) =e^(*-*o). (3.12) 

Since a linear transformation of Gaussian variables is also Gaussian, x{t) is also a Gaussian 
random variable. Therefore, it is completely characterized by the mean vector and covariance 
matrix. 

The mean vector is 

fi{t,to) = {x{t)) , (3.13) 

= U{t,to)x{to) + l{t), l{t)= f U{t,to)l{t)dt, 

Jto 

and the covariance matrix is 

E(t,io) = {[xit] - iiit)] [xit] - fi{t)f) , (3.14) 

ft ft 



U{t,to)a{to)U^{t,to)+ I j U{t,n)e{ri)Q{n)5{n-T2)e^{T2)U^{t,r2)dndT2, 

J to Jto 

U{t, to)a{to)U^{t, to) + f U{t, T)g{T)U'^{t, T)dT. 

Jto 
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Combining the results for nit^tQ) and (T(t,to)) the fundamental solution is 

P(t,x|to,xo) = ^ , exp(-;^[x-^(t,to)]'^ [S-^(t,to)] [x-/i(t,to)] ) • 

A/(27r)"detcr(t,to) V ^ / 

(3.15) 

Finally, observe that 
dYi d /"* 

— {t) = F{t) [U{t,tQ)T.{to)U^{tM] + [U{t,to)ntQ)U^{t,to)\ F^{t) + -J U{t,T)g{T)U^{t,T)dT, 

(3.16) 

= F{t) [U{t,to)J:{to)U^{t,to)] + [U{t,toMto)U'^{t,to)] F^{t) 

+ U{t,t)g{t)U{t,t) + F{t) f [U{t,T)G{T)U^{t,T)]dT+ [ [U{t,T)g{T)U^{t,T)]F^{T)dT, 

J to J to 



F(t)S(t) + S(t)F^(t)+5(t) 



where the Leibniz rule (Equation |3.7D has been used in the second step. This result will be 
used in the following section. 

4. Fundamental Solution II: State-Independent Diffusion Matrix 

In Section]^ the fundamental solution was derived for the time- independent affine, linear state 
model with additive noise. In this section, it is pointed out that a similar result follows if the 
noise is multiplicative but with state dependent diffusion matrix. This is a straightforward 
generalization of the result derived in [14]; it also assumes that the diffusion matrix is time- 
independent. 

The state model considered here is 

dx{t) = {F{t)x{t)) + e{x{t),t)d\i{t), eQe^ = gij{t). (4.1) 

The state model is assumed to be affine and linear and possibly explicitly time dependent. 
The diffusion matrix is assumed to be independent of state, but possibly (explicitly) time 
dependent. 

Observe that this includes the case studied in the previous section. Since the diffusion 
vielbein is state dependent (or the state model noise is multiplicative), the methods used in 
the previous section cannot be applied to this case. This model is also more general than the 
Yau filter case where no explicit time dependence is assumed. 

The FPKfe for the transition probability density P{t,x\t' ,x') is 

dP d 1 " d^P 

{t,x\to,xo) = - X] 7^ {{Fij{t)xj+k{t))P{t,x\t',x')) +-Y1 9ij{t)j^-^{t,x\t',x'), 



P{t, x\t, xo) = 6'^{x - xo). 



. . -I I ^ . . . UJylUJyJ 



(4.2) 
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Let P be the Fourier transform of P with respect to x, i.e., 

P(t,x|to,xo) = -^ j e'^=^''^^^P{t,k\to,xo)d''k. 



Since 



dP 1 f dP 

XjP{t,x\tQ,XQ) = J e'^i=^''''''xjP{t,k\to,xo)d''k, 



(4.3) 



(4.4) 



(2 



:7r)" 7 



P{t,k\to,xo)d''k, 



(t, A;|to,a;o)c?'*A;, 



dP 

— {t,x\to,xo) 



(27r) 



iE?=ifei-ifc.p(i,fc|to,xo), 



^ E 5^.(*)^^(*'^l*o,^o)) = /" e^'^r-i'^i'-.'i ^ gij{t)kikjP{t,k\to,xo)dJ'k, 

i,j=l ' ^ ij=l 



it follows that 



dP 

-^{t,k\to,xo) 



n d ^ \ 1 ^ 

J2 Fij{t)kig^ + ^k{t)ki P{t,k\t',x') - 2 9ijit)kikjP{t,k\t',x') 

.,j=l ^ i=l J i,j=l 



P(t, k\t, xq) = e 



(4.5) 



Since P{t,x\t' ,x') is Gaussian, so is P{t,k\to,xo) and we can choose the following ansatz for 
P{t,k\to,xo): 



P{t,k\tQ,XQ) = e 



(4.6) 



Without loss of generality, Sjj = Ejj. Note that if there is no explicit time dependence, the 
argument of /i and S is the difference between the times, i.e., t — tg. 
Therefore 

(n n n n n 1 ^ \ 

^ kifii - 2 X! kikjtij + i ^ Fij{t)kifij + ^ kki + ^ F^kiT^jiki + 2 X! 9ij{t)hkj j = 0. 
i=l i,j=l i,j=l i=l «J,(=1 2j=l / 



This implies that jii and Sjj obey the following ODEs: 

n 

fii{t,to) = ^Fij{t)iij{t,to) + li{t,to), 
l^i{to,to) = x{to), 



(4.7) 



(4.8) 
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and 

' n n 

tij{t,to) = y2^ii{t,to)Fij{t) + y2Fii{t)J:ij{t,to) + gij{t), 

1=1 1=1 (4-9) 

^^ 5]jj(to, to) = Ti{to). 

From the discussion in the previous section, the solution of these first-order ODEs are 

Hi{t,to) = Uij{t,t')x{tQ)+ I U{t,to)l{t)dt, (4.10) 

Jto 

and 



^^J{t, to) = U{t, to)^{to)U^{t, to) + [ U{t, T)G{T)U^{t, T)dT', (4.11) 

Jto 

so that 

P{t, x\to,xo) = ^^27r)"detS(t to) ~ [^~^*^*' ~ • 

(4.12) 

Thus, the expression is the same as derived earher even for the more general case (i.e., although 
constant diffusion vielbein implies constant diffusion matrix (for constant Q, constant difusion 
matrix does not imply constant diffusion vielbein). 

5. Practical Implementation 

In this section, some practical implementational aspects are discussed. Specifically, a com- 
putationally efficient filtering algorithm based the results derived in the previous sections is 



presented in Section 5.1. Some additional aspects are discussed in Section 5.2. 

It is important to note that the transition probability is given in terms of an exponential 
function. This implies that P{t" ,x"\t' ,x') is non-negligible only in a very small region, or the 
transition probability density tensor is sparse. The sparsity property is crucial for storage 
and computation speed. 

5.1 Sparse Kernel Grid Filtering Algorithm 

The implementation of the continous-discrete filtering solution exploiting the sparsity prop- 
erty of the transtition probability density, or kernel, is thus straightforward. 

1. Precompute the transition probability density is given by {t" > t') 

P{t",x"\t',x')= ^ ^ (5.1) 

V(27r)'*detS(t",t') ^ ^ 

X^^P E [^^-^^^(t",f)] [^-\t",t%^ K'-/i,(t",0 
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2. Threshold the transition probabihty density and save it as a sparse tensor. That is, zero 
the entries corresponding to exponent larger than a certain (user-specified) threshold. 

3. At time 

(a) The prediction step is accomplished by 

p{tk\Y{tk-i)) = J P{tk, x\tk-i, x')p{tk-i,x'\Y{tk-i)) {rx'} . (5.2) 

Note that p{to\Y{to)) is simply the initial condition p{to,xo). 

(b) The measurement at time are incorporated in the correction step via 

= r<f;>gf,;-"^f';;-'>i, m) . (5.3) 

J p{y{tk)\Opiik,^\Y{tk-i)) 

This algorithm is referred to as the sparse grid filtering (SGF) algorithm. The thresh- 
olding step is crucial for real-time implementation. 

Note that the precomputation step assumes that the grid spacing and time intervals 
between measurements are known. 

In many applications, the measurement model is described by an additive Gaussian noise 
model, i.e., 

y{tk) = h{x{tk),tk)+w{tk), A; = 1,2,..., tk>to, (5.4) 
with w(t) ~ N{0, R{t)). Then, at observation time tk, the conditional density is given by 

f+ \vu \\ _ p{y{tk)\x)pitk,x\Y{tk-i)) 

" ' ^ " iPiy{tkmpitk,mtk-i)){d-^y ^'-'^ 

where p{y{tk)\x) is given by 

P{yitk)\x) = —— — , , exp \ -^iy{tk) - h{x{tk),tk)fiR{tk)y^{y{tk) - h{x{tk),tk)) \ , 

(5.6) 

and p{tk,x\Y{tk-i)) is given by 

p{tk,x\Y{tk-i)) = J P{tk,x\tk-i,Xk-i)p{tk-i,Xk-i\Y{tk-i)) {(Txk-i} . (5.7) 
5.2 Additional Remarks 

Observe also that the fundamental solution has a simple and clear physical interpretation. 
Specifically, when the signal model noise is small the transition probability is significant only 
near trajectories satisfying the noiseless equation. The noise variance quantifies the extent to 
which the state may deviate from the noiseless trajectory. 
The following additional observations can be made : 
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1. It is noted that some of the results derived in this paper are well-known and used in 
EKF. In particular, the result of Section ^ is used to derive the discrete-time version of 
the continuous-time state model (see, for instance, [15]). 

However, the use of these results is very different from that in EKF. Specifically, the 
EKF linearizes the nonlinear measurement model and propagates the conditional mean 
and variance, assuming the prior conditional probablility density is Gaussian. Thus, 
the EKF fails if the nonlinearity is not benign, or if the Gaussian approximation for the 
prior is not valid (e.g., multi- modal) . In the grid-based filtering, the standard deviation 
computed from the conditional probability density is a reliable measure of the filter 
performance. This is not the case of suboptimal methods like the EKF. 

In other words, the SGF uses the exact expression for the fundamental solution that is 
given by a Gaussian function, while the EKF make the unjustified approximation that 
the conditional probability density at any time is Gaussian. 

2. The conditional probability density calculated at grid points is exact (ignoring roundff 
errors) at those grid points. So, an interpolated version of the fundamental solution at 
coarser grid is close to the actual value. This suggests that a practical way of verifying 
the validity of the approximation is to note if the variation in the statistics with grid 
spacing, such as the conditional mean, is minimal. 

Observe that the time step size may be arbitrary. This means that the optimal solution 
(in the sense of being the correct solution) can be computed using this using a grid of 
sufficiently small grid spacing. However, the conditional state estimation will not be as 
good for large time steps, a fundamental limitation due to large measurement sample 
interval. 

3. In this paper, the correction step was carried out using a uniform grid. It is clear that 
a much faster approach for higher dimensional problems would be to use Monte Carlo 
integration methods. 

4. The prediction step computation can be speeded up considerably by restricting calcula- 
tion only in areas with significant conditional probability mass (or more accurately, in 
the union of the region of significant probability mass of p(y(tfc)|x) and p(x\Y{tf^_i))). 

5. Although it is possible to compute the transition probability density off-line, it may 
be more appropriate to compute it on-line computing only those terms where initial 
points are in the region where the density is significant. It is also then possible to use 
an 'adaptive grid', both in resolution and location/domain. Observe that the one-step 
approximation of the path integral can be stored more compactly as it is Gaussian. 
Note that when the noise is small, grid spacing needs to be finer in order to adequately 
sample the conditional probability density. 
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6. Although the exact analytical expression of the conditional probability density is un- 
known, the accuracy of the grid-based method can be made very high and robust with 
small enough grid spacing. This situation is similar to the case of numerical solution of 
partial differential equations, where the exact solution is not known, but high accuracy 
numerical solutions (even machine precision) can still be obtained. Thus, for practical 
purposes "numerically exact" solutions obtained with a fine grid are optimal. Note that 
in the case of the PDEs, even when the exact solution is known, it often does not yield 
accurate numerical solutions. 

6. Comments on Discrete-Discrete Sparse Kernel Grid Filtering and Par- 
ticle Filtering 

6.1 Discrete-Discrete Filtering 

In Section ^, the continuous-time state process has been converted to a discrete-time state 
sequence. This result is simple and exact because the model is linear. This is not possible for 
a general nonlinear state model, i.e., there is no such general formula relating the continuous- 
time process with its equivalent discrete-time sequence. In fact, a simple, nonlinear state 
process is not equivalent to a simple nonlinear state sequence. 

Observe that a discrete-time sequence description is more general than a continuous-time 
process. This is because less is assumed in the specification of the state sequence (for example, 
no assumption on continuity, differentiability are needed). However, the fundamental dynam- 
ical laws of classical physics are expressed in terms of continuous-time ODEs. Note that 
a simple, nonlinear dynamical model may not result in a "simple" discrete-time dynamical 
model. 

Often, a nonlinear discrete-time state process, or sequence, is studied. This can be either 
as a simple (e.g., Euler) approximation of a continuous-time state model, or directly via 
other means. In this section, the state sequence model is assumed to be given. Thus, one is 
led to consider the following general discrete-discrete filtering problem where the signal and 
measurement sequences are as follows: 

Xfc = /(xfc_i,Vfe_i, A;), 

Here, / : M"^ x M"^ — > M"^ is a nonlinear function of the state Xk_i, {\ik_i,k G N} is an 
independent, identically distributed (i.i.d) process noise sequence, and dimensions 
of the state and process noise vectors. Similarly, /i^, : M"^ x R"" — > W^y is a nonlinear 
function, {ni^,k € N} is an i.i.d. measurement noise sequance, and dimensions of 

the measurement and measurement noise vectors. 

The universal discrete-discrete filtering problem is to evolve the conditional probability 
distribution of the state at time k, based on the set of all available measurements y^.;., i.e., 
p{xh\zi-k)- The recursive solution is well-known and similar to that discussed in Section ^. 



(6.1) 



Thus, if p{xk-i\yi:k-i) is known (note p{xo\yo) = p{xo) is the prior, or initial pdf), then 



Note that p{xk\xk-i, yi-.k-i) = p{^k\^k-i) due to the first-order Markov property of the state 
process. 

Observe that "discrete" here refers to the time, not the state; the states take values in 
the continuum (M"^). 

This solution assumes that p{xk\xk-i) and p{yk\xk) are known. Of course, they are 
defined by the state and measurement processes. In a practical situation, a closed form 
expression for these quantities is desired. This is indeed possible for several cases of practical 
interest, such as additive Gaussian noise. 

Of course, no closed form expression for the conditional probability density is known 
for an arbitrary initial distribution. Also, such a closed form solution, if it exists, may be 
unsuitable for an accurate numerical evaluation. 

6.2 Sparse Kernel Grid Filtering 

It is usually stated that grid-based methods are computationally prohibitive when dealing 
with high-dimensional spaces. However, the transition probability density tensor is sparse, 
with the sparsity determined by the grid spacing, grid size and signal model noise. Likewise, 
the correction due to measurements is going to be sparse. Then, the conditional probability 
density is significant only in a small fraction of the grid space. Therefore, the prediction 
step needs to be carried out in a very small region of the grid space. This implies that the 
memory requirements and the number of flops is considerably less than that suggested by 
naive counting. 

The arguments in Section || carry over to this well. In particular, naive flops and 

memory estimate are unduly pessimistic for excellent performance using SGF. 

6.3 Some Remarks on Particle Filtering 

An alternative to grid based techniques is particle filtering (see, for instance, [16]). A re- 
cursive Bayesian filter is obtained by Monte Carlo simulations. The idea is to represent the 
required conditional probability density by a set of random samples with associated weights 
and then to compute estimates based on these samples and weights. As the number of sam- 
ples becomes very large, this approaches the optimal Bayesian solution. It has been shown 
that the performance of a PF is very good for many cases. An important aspect of particle 
filtering is a good choice of the proposal density. For a discrete-time state process with linear 
measurement process, optimal proposal density known for PF. Note that PF solution is not 
robust for the case when the measurement model is non-linear. 




(Correction). 



(Prediction) 
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A common problem with the particle filter is the degeneracy phenomenon, where after 
a few iterations, all but one particle will have negligible weight. In fact, the variance of 
the importance weights can only increase over time, and thus, it is impossible to avoid the 
degeneracy phenomenon. This degeneracy implies that a large computational effort is devoted 
to updating particles whose contribution to the approximation to is almost zero. Methods of 
alleviating this problem include choosing a good importance density and resampling. 

From our discussion, it follows that the sample requirements in PF are analogous to the 
number of grid points in SGF (which is set by the threshold). The key point is that the SGF 
method does not require evaluation at all grid points due to the sparsity of the conditional 
probability density and the transition probability density. 

It is also noted that as long as the grid spacing is small enough to adequately sample 
the conditional probability density, there are no problems in propagating the conditional 
probability density. 

The SGF approach is likely to be useful in providing an estimate of the number of particles 
needed for the particle filtering. It is an interesting exercise to compare the computational 
load of the SGF with that of a robust PF solution (which is problem dependent). 

7. Example 

In the examples, use is made of the Tensor toolbox in MATLAB developed by Bader and 
Kolda [17]. It has the multininear sparse tensor class, essential for SGF. 

Consider the 2D coordinated turn model with oo = vr/SO with hy = 1 and time step size 
T is 0.5 s. The measurement model drift is h{x) = tan~^(x2/xi), or the angle, with noise 
variance (vr/SO rad)^. Thus, the measurement model is manifestly nonlinear function of the 
state variables. The transition probability density was computed in the interval [—25, 25] with 
grid spacing Ax = 1 along xi^2 using the Tensor toolbox in MATLAB [18]. Note that the 
Courant number 2hyT / /S.x^ = 1 so that a one-step explicit scheme is unstable (an implicit 
scheme such as the Crank-Nicholson scheme is stable, but not as accurate [19]). 

Finally, the sparsity of the grid is about 99% when the threshold is set to 10 (i.e., of 
exponent is greater than 10, the transition probability multilinear array element is set to be 
0). The sparsity of P{t" , x"\t' , x') tensor is crucial in enabling a real-time computation of the 
p{t,x\Y) on a current PC. In addition, it places considerably less memeory requirements. 

Figures [l| and |2| plot that the result obtained used the formula for the fundamental 
solution and using SGF. Also plotted are the conditional means and a bounds using the 
conditional probability density. It is seen that the tracking performance is very good in the 
sense that the conditional mean is found to be within a a most of the time. The root mean 
square errors for the two states over 100 Monte Carlo runs is found to be around 3 at t = 60. 
Observe that the transition probability denisty tensor needed to be computed only once. 

In summary, proposed SGF method is seen to be stable and accurate even for such large 
time steps; the errors introuced in sprasifying the transition probability density are seen to 
be minimal, while the computational savings (in terms of speed and memory) are immense. 
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Figure 1: A sample of a the Xi state process, its conditional mean {xi) and standard deviation tJi. 



8. Conclusion 

In this paper it has been shown that the continuous-discrete filtering problem with an affine, 
linear state model and with state independent diffusion matrix can be solved accurately 
using the exact fundamental solution of the corresponding FPKfe valid for an arbitrary time 
step size. Unlike the continuous-continuous case studied by Stephen Yau and collaborators, 
there are no restrictions on the measurement model in the continuous-discrete case. The 
sparsity of the transition probability density is then exploited to demonstrate that sparse 
grid filtering techniques can be used to solve higher dimensional problems with considerably 
less computational effort. Similar comments apply to the discrete-discrete filtering problem 
as well. 

It is also interesting to note that the path integral approach can be used to derive the 
expression for the fundamental solution for the FPKfe for the nonlinear state model case. 
Results for the additive and multiplicative noise cases are derived in [20] and [21]. In fact, it 
has been shown that the curdest approximation of the path integral formulas leads to very 
accurate results (see [22]). Thus, the sparse grid filtering approach yields a unified approach 
to tackling the general nonlinear filtering problem. Since it enables one to focus computation 
in regions of significant probability mass, it offers the possibility of solving higher dimensional 
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Figure 2: A sample of a the X2 state process, its conditional mean {X2) and standard deviation tT2- 

filtering problems in real-time. 

A. General Time Dependent Case 

The state transition matrix is the solution of the following equation: 

U{to,to) = I. 

The results derived in Section ^ and ^ are not valid when the time-dependent matrix, F{t), 
does not commute at different times (note that [F{t), F{t)] = 0, i.e., it commutes at equal 
times). 

A formal solution for the general time-dependent case can be written down using the con- 
cept of time-ordered product. Let Fi{ti),i = 1, . . . ,r be a string of time-dependent matrices 
and let h < t2 < tn- The, the time-ordered product 

T{Fi^{U^)F,^{U^) ■ --FniUj) = Fi{ti)F2{t2) ■ • (A.2) 

In other words, the time-ordered product of a string of operators (with time as its argu- 
ment) is defined to be the string rearranged so that the eariler time operators are to the right 
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of the later ones. In general, there is an ambiguity when the operators do not commute at 
equal times. In the case of interest, Fi{t) = F{t), so they commute at equal times. 

Observe that under the time-ordering symbol, everything commutes. Hence, a time 
derivative can be taken in the usual manner: 

±T (e^o '^^^(^)) = T [F{t)J'o '^^^(^)) , (A.3) 



Here the exponential of the matrix is defined via the usual power series expansion of the 
exponential function. In the second step, use is made of the fact that t is the latest time, so 
that time-ordering puts it on the leftist. Since t is the latest time, it can be taken out of the 
time-ordering: 

|r (e^o '^-^(^)) = Fit)T (e^4 '^^^(^)) . (A.4) 



Thus, the solution of Equation [A.l| is 

?7(t,to)=T(e^4'^"''(^)), t>t'. (A.5) 

The satisfaction of the boundary condition is obvious. Also, uniqueness of the solution follows 
because it is the solution of a first-order differential equation with a given initial value. 
An alternative form is the following: 

r't ft\ ft ft\ ft2 



U{t,to) = l+ dtiF{ti)+ dti dt2F{ti)F{t2) + dti dt2 dt3F{ti)F{t2)F{t3) + 

(A.6) 



The correctness of the formula follows from observing that when the right hand side is differ- 
entiated with respect to time, each term gives the previous one times F{t). Observe that the 
various factors of F{t) are time-ordered, i.e., matrices on later time are at the left. In fact, 
equality of this expression to Equation A.5 follows from the following identity: 

rt Hi rtn-i 1 rt 



dti dt2--- dtnF{h)F{t2)---F{tn) = — dtldt2 ■ ■ ■ dtnT {F{tl)F{t2) ■ ■ ■ F{tn)} . 

Jto Jto Jto Jto 

(A.7) 



Equation A^ is referred to as Dyson's formula and plays a central role in perturbative 
calculations in quantum mechanics and quantum field theory. 



B. Application: Maneuvering Target Tracking Signal Models 

In this section, we summarize the results for many of the linear models that arise in ma- 
neuvering target tracking problems. For a nice, up-to-date review, the reader is referred 
to [12]. Since the models are not time-independent, we can express results in terms of the 
time difference, or equivalently, set the initial time to 0. 
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B.l Nilpotent or Orthogonal Matrix 

The matrix exponential function can be explicitly written in a closed form using the power 
series method for the following two cases: 

• Nilpotent F, i.e., = for some positive integer r; 

• Skew-symmetric F, i.e., F^ = —F. 

The simplest example is the white noise acceleration model is 
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The state model for the Wiener process acceleration model is 
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and the corresponding results are 
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Another state model with a nilpotent drift matrix is 
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(B.l) 



(B.2) 



(B.3) 



(B.4) 



(B.5) 



which becomes the two-dimensional constant velocity model when the fifth state variable is 
ignored. In this case. 
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(B.6) 
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All these examples correspond to the nilpotent case. 

The coordinated turn state model is the simplest state model with skew-symmetric drift 
matrix: 
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The results in this case are 
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(B.8) 
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The three-dimesional skew-symmetric drift matrix leads to the nearly constant turn state 
model 



(B.IO) 



The exponential can be evaluated by using the Cayley-Hamilton theorem which states that 
a matrix satisfies its own charactersitic equation. The characteristic equation for O is 
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This implies that 



= -uf^, or = 



where 



= 



(B.12) 



Therefore, all powers of O are related to the first and second powers. Prom this we see that 
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73 — Osinwf -I- — cosa;t). 
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This is known as Rodrigues' formula. Clearly, when cj = 0, e 



-Qt 



I3. Since 



LOi — LO LO1LO2 U)lU)^ 



(B.14) 



we finally obtain 



[/(t,0) = 



and 
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—0)3 sin + 0)20)1(1 — cos a;f) cos + 0)2(1 — cos wt) 0)1 sin o;i + 0)20)3(1 — cos o;i) 
0)2 sin ut + 0)30)1 ( 1 — cos ut) —ui sin ut + 0)30)2 ( 1 — cos ut) cos o;f + 0)3 ( 1 — cos 0;) 

(B.15) 



h{t) = - B{t)ldt, 
Jt' 

= iA{t") - A{t'))l, 



(B.16) 



where 
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B.2 Other Cases 
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For the general case, a systematic and elegant method for computing the exponential of an 
arbitrary matrix is presented in [13]. A particularly attractive feature is that explicit formulas 
can be systematically and elegantly derived for an (up to) arbitrary 4x4 matrix F in terms 
of F and its eigenvalues. It requires the determination of the minimial polynomial and the 
characteristic polynomial^. Of course, this technique could also have been applied to the 
orthogonal and nilpotent F cases studied in the previous section. 
The constant turn model with known turn rate is defined as 



^Recall that any matrix satisfies its characteristic equation. The minimal polynomial, M{x), is the monic 
polynomial of least degree such that M{F) = 0. The minimal polynomial always divides the characterisitic 
polynomial. Since F1F2 =0 docs not imply _Fi = and/or F2 — for arbitrary matrices Fi and F2, the 
minimal polynomial needs to be determined by explicit matrix multiplication. 
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The eigenvalues of the drift matrix are 0, 0, +iu;, —iu}, and the minimal polynomial is A(A — 
iLo){X + iu) and it follows that 
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In the constant turn model with unknown turn rate, lo is usually described by a Wiener 
process model. Note that the resulting model is nonlinear: 
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Note that the state model for the 2D essentially constant speed/nearly coordinated turn 
modelis 
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which is related to the constant turn model with known turn rate via a permutation of state 
variables. 

The Singer model assumes a zero-mean first-order Markov model for the acceleration: 
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The minimal polynomial is the same as the characteristic polynomial. The exponential is^ 

U{t,Q)=A2{t)F^ + Ai{t)F + h, (B.23) 

1 Ai{t) A2{t) 
1 Ai{t)-aA2{t) 
A2{t) - aAi{t) + I 



''For simplicity, in the remainder of the section, only the expression for U{t, 0) will be written down; the 
expressions for n and E follow from it straightforwardly. 
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where 

A2{t) = ^ (e-"* -1 + at), 
Ai{t)=t. 

A modified verison of the Singer model is the mean-adaptive acceleration model 



(B.24) 
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which can be solved easily using the expression for U{t, 0). 
The state model 
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is referred to as the second planar constant turn model. As the eigenvalues are distinct 
(0, ±io), the minimal polynomial is the characteristic polynomial. In this case 
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The state model 
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refers to the Markovian jump mean acceleration model. Since the eigenvalues are all 
(0, —a, —(3), minimal polynomial is characteristic polynomial. The exponential of F 

by 

U{t,0) = A2{t)F^ + A^{t)F + h, 

"1 Ai{t)-pA2it) A2{t) 
1 + i3'^A2{t) - pAi{t) Ai{t) - (a + /3)^2(0 
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The planar variable turn model is 
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Assuming (a/2) ^ /3, the eigenvalues are 0,ai,a2, where ai^2 = ^ \J \ — Since the 
eigenvalues are distinct, the characteristic polynomial is the minimal polynomial and one 
obtains 
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The state model for a certain class of oscillatory targets is 
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Let the eigenvalues be Ai,A2, A2. The expression for U{t,0) is 
1 
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The Markov acceleration model for constant turns 
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also arises for a class of oscillatory targets. The characteristic equation for F is 

A^ [A(A + /3) + a] = 0, (B.36) 
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so that the eigenvalues are 0, 0, ai, 02 where 01^2 = ~^ ^ y ^ ~ P- The minimal polynomial 

is found to be identical to the characteristic polynomial. From a theorem in [13] , one obtains 

U{t, 0) = (1 + tF) + [h{t){F - ai) + f2{t){F - as)] , (B.37) 

where 

£!!Ml±^M), (B.38, 
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